Probabilistic classifiers with high-dimensional data.
Abstract
For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small-n, large-p classification problems despite their importance in medical decision making. In this paper, we introduce two criteria for the assessment of probabilistic classifiers, well-calibratedness and refinement, and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed two extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated, or at least not "anticonservative", using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set.
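To make the two criteria concrete, the sketch below shows one common way to quantify calibration and refinement from held-out predicted probabilities: a binned Murphy decomposition of the Brier score into reliability (a calibration term), resolution (a refinement-like term), and uncertainty. This is an assumption chosen for illustration and is not necessarily the evaluation measure developed in the paper. The names `calibration_refinement`, `out_of_fold_probs`, and `fit_predict` are hypothetical.

```python
import numpy as np


def calibration_refinement(p, y, n_bins=10):
    """Binned Murphy decomposition of the Brier score for binary outcomes.

    p: predicted probabilities of class 1; y: observed 0/1 labels.
    Returns (reliability, resolution, uncertainty); lower reliability means
    better calibration, higher resolution means sharper, more useful forecasts.
    NOTE: illustrative stand-in, not the paper's own measures.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    base_rate = y.mean()
    # Assign each prediction to one of n_bins equal-width bins on [0, 1].
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    reliability = 0.0
    resolution = 0.0
    for b in np.unique(bins):
        mask = bins == b
        weight = mask.sum() / n
        mean_pred = p[mask].mean()   # average forecast in this bin
        obs_freq = y[mask].mean()    # observed class-1 frequency in this bin
        reliability += weight * (mean_pred - obs_freq) ** 2
        resolution += weight * (obs_freq - base_rate) ** 2
    uncertainty = base_rate * (1.0 - base_rate)
    return reliability, resolution, uncertainty


def out_of_fold_probs(X, y, fit_predict, n_folds=5, seed=0):
    """Collect held-out predicted probabilities with K-fold cross-validation.

    fit_predict(X_train, y_train, X_test) is a user-supplied (hypothetical)
    function returning P(class 1) for every row of X_test.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    probs = np.empty(len(y), dtype=float)
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        probs[fold] = fit_predict(X[train], y[train], X[fold])
    return probs
```

With small n and large p, any feature selection should happen inside `fit_predict`, so that it is repeated on each training fold; selecting features on the full data set first would make the held-out probabilities look better calibrated than they are.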
Related resources
Accurate Fault Classification of Transmission Line Using Wavelet Transform and Probabilistic Neural Network
Fault classification in distance protection of transmission lines, considering the wide variation in fault operating conditions, has been a very challenging task. This paper presents a probabilistic neural network (PNN) and a new feature selection technique for fault classification in transmission lines. Initially, wavelet transform is used for feature extraction from half cycle of post-fa...
Measuring the effect of nuisance variables on classifiers
In real-world classification problems, nuisance can cause wild variability in the data. Nuisance corresponds for example to geometric distortions of the image, occlusions, illumination changes or any other deformations that do not alter the ground truth label of the image. It is therefore crucial that designed classifiers are robust to nuisance variables, especially when these are deployed in r...
Fawzi, Frossard: Measuring the Effect of Nuisance Variables
In real-world classification problems, nuisance variables can cause wild variability in the data. Nuisance corresponds for example to geometric distortions of the image, occlusions, illumination changes or any other deformations that do not alter the ground truth label of the image. It is therefore crucial that designed classifiers are robust to nuisance variables, especially when these are dep...
SUBCLASS FUZZY-SVM CLASSIFIER AS AN EFFICIENT METHOD TO ENHANCE THE MASS DETECTION IN MAMMOGRAMS
This paper is concerned with the development of a novel classifier for automatic mass detection of mammograms, based on contourlet feature extraction in conjunction with statistical and fuzzy classifiers. In this method, mammograms are segmented into regions of interest (ROI) in order to extract features including geometrical and contourlet coefficients. The extracted features benefit from...
Spectral Embedding Based Probabilistic Boosting Tree (ScEPTre): Classifying High Dimensional Heterogeneous Biomedical Data
The major challenge with classifying high dimensional biomedical data is in identifying the appropriate feature representation to (a) overcome the curse of dimensionality, and (b) facilitate separation between the data classes. Another challenge is to integrate information from two disparate modalities, possibly existing in different dimensional spaces, for improved classification. In this pape...
Journal title:
- Biostatistics
Volume 12, Issue 3
Pages -
Publication date 2011